Focus on Client-side Errors in a Monitoring System

Learn what client-side errors are and their impact on the service.

Client-side errors

In a distributed system, clients often access the service via HTTP requests. When a request fails to process, we can see the failure in the logs of our web and application servers. If many requests fail, we can observe a spike in internal server errors (HTTP 500).
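As a minimal sketch of spotting such a spike, the snippet below counts HTTP 500 responses per minute and reports the minutes that reach a threshold. The log format (`<ISO timestamp> <status> <path>`), the function names, and the threshold are assumptions made for illustration; real server logs will need their own parsing.

```python
from collections import Counter


def count_500_per_minute(log_lines):
    """Count HTTP 500 responses per minute.

    Assumes a hypothetical log format "<ISO timestamp> <status> <path>",
    for example "2024-01-01T10:00:05 500 /checkout".
    """
    counts = Counter()
    for line in log_lines:
        parts = line.split()
        if len(parts) < 2:
            continue  # skip malformed lines
        timestamp, status = parts[0], parts[1]
        if status == "500":
            # The first 16 characters of the timestamp identify the minute.
            counts[timestamp[:16]] += 1
    return counts


def spike_minutes(counts, threshold):
    """Return the minutes whose HTTP 500 count reaches the threshold."""
    return sorted(minute for minute, n in counts.items() if n >= threshold)
```

A fixed threshold is the simplest choice here; comparing each minute against a recent baseline is a common alternative when the normal error rate varies.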

Errors whose root cause is on the client side are hard to respond to because the service has little to no insight into the client’s system. We might look for a dip in the load compared to averages, but such a graph is usually hard to interpret. It can produce false positives and false negatives, for example, when the load is unexpectedly variable or when only a small portion of the client population is affected.
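To make the limitation concrete, here is a small sketch of a dip detector that compares the current request count against a historical average. The function name, the sample numbers, and the 30% drop threshold are all hypothetical.

```python
from statistics import mean


def is_traffic_dip(history, current, drop_fraction=0.3):
    """Return True if `current` is more than `drop_fraction` below the mean of `history`.

    `history` holds request counts from comparable past intervals
    (a hypothetical input for this sketch).
    """
    if not history:
        return False  # no baseline to compare against
    baseline = mean(history)
    return current < baseline * (1 - drop_fraction)


history = [1000, 1020, 980, 1000]  # made-up requests per minute

# 5% of clients cannot reach us: the dip is too small to flag (false negative).
small_outage = is_traffic_dip(history, 950)

# A large drop is flagged, although it could equally be a natural lull (false positive).
large_drop = is_traffic_dip(history, 600)
```

With these numbers, `small_outage` is `False` and `large_drop` is `True`, which illustrates why a dip in load alone is an unreliable signal for client-side failures.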

Many factors can prevent clients from reaching the server. These include the following:

  • Failure in DNS name resolution.
  • Any failure in routing along the path from the client to the service provider.
  • Any failures with third-party infrastructure, such as middleboxes and content delivery networks (CDNs).
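The failure categories above can be partly told apart from the client's vantage point. The following is a rough sketch, not a complete prober: it first tries DNS resolution and then a TCP connection, and reports which step failed. The function name, the return labels, and the injectable `resolve` and `connect` parameters are assumptions for illustration; a failed connection alone cannot say whether routing, a middlebox, a CDN, or the server itself is at fault.

```python
import socket


def classify_reachability(host, port=443, timeout=3.0,
                          resolve=socket.getaddrinfo,
                          connect=socket.create_connection):
    """Classify a reachability check as 'dns_failure', 'connect_failure', or 'reachable'.

    `resolve` and `connect` default to the standard socket functions and can be
    replaced with stubs, which keeps the function testable without a network.
    """
    try:
        resolve(host, port)
    except socket.gaierror:
        # Name resolution failed before any packet was sent to the service.
        return "dns_failure"
    try:
        conn = connect((host, port), timeout=timeout)
    except OSError:
        # Covers timeouts, unreachable networks, and refused connections.
        return "connect_failure"
    conn.close()
    return "reachable"
```

A call such as `classify_reachability("example.com")` would need to run on the client side or from external vantage points, since the service itself never sees requests that fail at these steps.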

[Figure: two panels, each showing a web browser sending an HTTP request (https://example.com) to a web server. Server-side errors: the web server responds with HTTP 500 (internal server error); it's easy to detect an increase in these errors. Client-side errors: the request is stopped before it reaches the web server; it's hard to consistently detect problems from traffic dips.]
Server-side errors versus client-side errors

Failures due to a routing bug

Let’s look at a real-world example of an error that impacted a large number of a service’s customers without the service being readily aware of it.

One of Google’s peer ISPs accidentally announced Internet routes that it wasn’t supposed to. As a result, the traffic of many of Google’s customers was routed through unintended ISPs and didn’t reach Google. Clients were frustrated because they couldn’t reach Google, while Google might not have been aware of the problem right away because it didn’t occur on Google’s own infrastructure.

[Figure: BGP leak on Nov 12, 2018. Networks shown: Google (AS 15169), MainONE (AS 37282), Cogent (AS 174), Trans telecom (AS 20485), China Telecom (AS 4809), Charter (many ASNs), Verizon Wireless (AS 22394), and several other networks. Labels: 216.58.192.0/19, 216.58.192.0/22, LEAKED. Legend: legit advertisement, leaked advertisements, client traffic.]
BGP leak

The above leak isn’t unique, and similar issues keep arising. Another such leak happened on April 16, 2021, when an AS mistakenly announced over 30,000 BGP prefixes. This resulted in a 13-fold spike in the inbound traffic to its network. In that case, the increase in inbound traffic was noticed, and the problem was resolved.

The monitoring systems of the impacted services might not readily catch events like these. Monitoring such situations is crucial so that the application remains available to all of its customers. Therefore, in the next lessons, we’ll go through methods that help us monitor the situations mentioned above.
